Large Scale Distributed Multiclass Logistic Regression

نویسندگان

Pengtao Xie

Jin Kyu Kim

Eric P. Xing

چکیده

Multiclass logistic regression (MLR) is a fundamental machine learning model to do multiclass classification. However, it is very challenging to perform MLR on large scale data where the feature dimension is high, the number of classes is large and the number of data samples is numerous. In this paper, we build a distributed framework to support large scale multiclass logistic regression. Using stochastic gradient descent to optimize MLR, we find that the gradient matrix is computed as the outer product of two vectors. This grants us an opportunity to greatly reduce communication cost: instead of communicating the gradient matrix among machines, we can only communicate the two vectors and use them to reconstruct the gradient matrix after communication. We design a Sufficient Vector Broadcaster (SVB) to support this communication pattern. SVB synchronizes the parameter matrix of MLR by broadcasting the sufficient vectors among machines and migrates gradient matrix computation on the receiver side. SVB can reduce the communication cost from quadratic to linear without incurring any loss of correctness. We evaluate the system on the ImageNet dataset and demonstrate the efficiency and effectiveness of our distributed framework.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Distributed Machine Learning via Sufficient Factor Broadcasting

Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML problems starting at millions of samples and tens of thousands of classes, their parameter matrix can grow at an unexpected rate, resulting in high parameter s...

متن کامل

Modified Logistic Regression: An Approximation to SVM and Its Applications in Large-Scale Text Categorization

Logistic Regression (LR) has been widely used in statistics for many years, and has received extensive study in machine learning community recently due to its close relations to Support Vector Machines (SVM) and AdaBoost. In this paper, we use a modified version of LR to approximate the optimization of SVM by a sequence of unconstrained optimization problems. We prove that our approximation wil...

متن کامل

An efficient model-free estimation of multiclass conditional probability

Conventional multiclass conditional probability estimation methods, such as Fisher’s discriminate analysis and logistic regression, often require restrictive distributional model assumption. In this paper, a model-free estimation method is proposed to estimate multiclass conditional probability through a series of conditional quantile regression functions. Specifically, the conditional class pr...

متن کامل

Distributed training of Large-scale Logistic models

Regularized Multinomial Logistic regression has emerged as one of the most common methods for performing data classification and analysis. With the advent of large-scale data it is common to find scenarios where the number of possible multinomial outcomes is large (in the order of thousands to tens of thousands) and the dimensionality is high. In such cases, the computational cost of training l...

متن کامل

Distributed Newton Method for Regularized Logistic Regression

Regularized logistic regression is a very successful classification method, but for large-scale data, its distributed training has not been investigated much. In this work, we propose a distributed Newton method for training logistic regression. Many interesting techniques are discussed for reducing the communication cost. Experiments show that the proposed method is faster than state of the ar...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1409.5705 شماره

صفحات -

تاریخ انتشار 2014

Large Scale Distributed Multiclass Logistic Regression

نویسندگان

چکیده

منابع مشابه

Distributed Machine Learning via Sufficient Factor Broadcasting

Modified Logistic Regression: An Approximation to SVM and Its Applications in Large-Scale Text Categorization

An efficient model-free estimation of multiclass conditional probability

Distributed training of Large-scale Logistic models

Distributed Newton Method for Regularized Logistic Regression

عنوان ژورنال:

اشتراک گذاری